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Abstract 

Understanding  the  mechanisms  of  learning  is  one  of  the  cen¬ 
tral  questions  of  Cognitive  Science.  Recently  Marcus  et  al. 
showed  that  seven-month-old  infants  can  learn  to  recognize 
regularities  in  simple  language-like  stimuli.  Marcus  proposed 
that  these  results  could  not  be  modeled  via  existing  connec- 
tionist  systems,  and  that  such  learning  requires  infants  to  be 
constructing  rules  containing  algebraic  variables.  This  paper 
proposes  a  third  possibility:  that  such  learning  can  be  ex¬ 
plained  via  structural  alignment  processes  operating  over 
structured  representations.  We  demonstrate  the  plausibility  of 
this  approach  by  describing  a  simulation,  built  out  of  previ¬ 
ously  tested  models  of  symbolic  similarity  processing,  that 
models  the  Marcus  data.  Unlike  existing  connectionist  simu¬ 
lations,  our  model  learns  within  the  span  of  stimuli  presented 
to  the  infants  and  does  not  require  supervision.  It  can  handle 
input  with  and  without  noise.  Contrary  to  Marcus’  proposal, 
our  model  does  not  require  the  introduction  of  variables.  It 
incrementally  abstracts  structural  regularities,  which  do  not 
need  to  be  fully  abstract  rules  for  the  phenomenon  to  appear. 
Our  model  also  proposes  a  processing  explanation  for  why  in¬ 
fants  attend  longer  to  the  novel  stimuli.  We  describe  our 
model  and  the  simulation  results  and  discuss  the  role  of  struc¬ 
tural  alignment  in  the  development  of  abstract  patterns  and 
rules. 

Introduction 

Understanding  the  mechanisms  of  learning  is  one  of  the  cen¬ 
tral  questions  of  cognitive  science.  Recent  studies  (Gomez  & 
Gerken,  1999;  Marcus,  Vijayan,  Rao  &  Vishton,  1999)  have 
shown  that  showed  that  infants  as  young  as  seven  months 
can  process  simple  language-like  stimuli  and  build  generali¬ 
zations  sufficient  to  distinguish  familiar  from  unfamiliar 
patterns  in  novel  test  stimuli.  In  Marcus  et  al’s  study,  the 
stimuli  were  simple  ‘sentences,’  each  consisting  of  three 
nonsense  consonant-vowel  ‘words’  (e.g.,  ‘ba’,  ‘go’,  ‘ka’). 
All  habituation  stimuli  had  a  shared  grammar,  either  ABA  or 
ABB.  In  ABA-type  stimuli  the  first  and  the  third  word  are 
the  same:  e.g,  ‘pa-ti-pa.’  In  ABB-type  stimuli  the  second 
and  the  third  word  are  identical:  e.g.,  Te-di-di’.  The  infants 
were  habituated  on  16  such  sentences,  with  three  repetitions 
for  each  sentence.  The  infants  were  then  tested  on  a  different 


set  of  sentences  that  consisted  of  entirely  new  words.  Half  of 
the  test  stimuli  followed  the  same  grammar  as  in  the  habitua¬ 
tion  phase;  the  other  half  followed  the  non-trained  grammar. 
Marcus  et  al.  found  that  the  infants  dishabituated  signifi¬ 
cantly  more  often  to  sentences  in  the  non-trained  pattern 
than  to  sentences  in  the  trained  pattern. 

Based  on  these  findings  Marcus  et  al.  proposed  that  in¬ 
fants  had  learned  abstract  algebraic  rules.  They  noted  that 
these  results  cannot  be  accounted  for  solely  by  statistical 
mechanisms  that  track  transitional  probabilities.  They  fur¬ 
ther  argue  that  their  results  challenge  connectionist  models 
of  human  learning  that  use  similar  information,  on  two 
grounds:  (1)  the  infants  learn  in  many  fewer  trials  than  are 
typically  needed  by  connectionist  learning  systems;  (2)  more 
importantly,  the  infants  learn  without  feedback.  In  particular, 
Marcus  et  al.  demonstrated  that  a  simple  recurrent  network 
with  the  same  input  stimuli  could  not  model  this  learning 
task. 

In  response,  several  connectionist  models  have  attempted 
to  simulate  these  findings.  Unfortunately,  all  of  them  to  date 
include  extra  assumptions  that  make  them  a  relatively  poor 
fit  for  the  Marcus  et  al  experiment.  For  example,  Elman 
(1999;  Seidenberg  &  Elman,  1999)  use  massive  pre -training 
(50,000  trials)  to  teach  the  network  the  individual  stimuli. 
More  importantly,  they  turn  the  infants’  unsupervised  learn¬ 
ing  task  into  a  supervised  learning  task  by  providing  the 
network  with  external  training  signals.  Other  models  tailored 
to  capture  the  data  of  the  study  seem  unlikely  to  be  applica¬ 
ble  to  other  similar  cognitive  tasks  (Altmann  &  Dienes, 
1999).  Using  a  localist  temporal  binding  scheme,  Shastri 
and  Chang  (1999)  model  the  infant  results  without  pretrain¬ 
ing  and  without  supervision,  but  still  require  an  order  of 
magnitude  more  exposure  to  the  stimuli  than  the  infants  re¬ 
ceived. 

We  propose  a  third  alternative.  There  is  evidence  that 
structural  alignment  processes  operating  over  symbolic 
structured  representations  participate  in  a  number  of  cogni¬ 
tive  processes,  including  analogy  and  similarity  (Gentner, 
1983),  categorization  (Markman  &  Gentner,  1993),  detec¬ 
tion  of  symmetry  and  regularity  (Ferguson,  1994),  and  learn- 
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ing  and  transfer  (Gentner  &  Medina,  1998).  Although  these 
representations  and  processes  are  symbolic,  they  do  not  need 
to  be  rule-like,  nor  need  they  involve  variables.  Instead,  we 
view  the  notion  of  correspondence  in  structural  alignment  as 
an  interesting  cognitive  precursor  to  the  notion  of  variable 
binding1.  Correspondences  between  structured  representa¬ 
tions  can  support  the  projection  of  inferences,  as  the  analogy 
literature  shows,  and  therefore  a  symbolic  system  can  draw 
inferences  about  novel  situations  even  without  having  con¬ 
structed  rules.  Moreover,  as  discussed  below,  comparison 
can  be  used  to  construct  conservative  generalizations. 
Across  a  series  of  items  with  common  structure  such  a  proc¬ 
ess  of  progressive  abstraction  can  eventually  lead  to  abstract 
rule-like  knowledge.  The  attainment  of  rules,  in  those  cases 
where  it  occurs,  is  the  result  of  a  gradual  process.  As  we 
will  show,  symbolic  descriptions  can  be  used  with  structural 
alignment  to  model  learning  that  is  initially  conservative,  but 
which  occurs  fast  enough  to  be  psychologically  realistic. 

We  first  describe  our  simulation  model  of  the  Marcus  et  al 
task,  which  uses  a  simple  combination  of  preexisting  simula¬ 
tion  modules,  i.e.,  SME,  MAGI,  and  SEQL.  All  of  these 
modules  have  been  independently  tested  against  psychologi¬ 
cal  data  and  independently  motivated  in  prior  modeling 
work.  With  the  exception  of  domain-specific  encoding  pro¬ 
cedures,  no  new  processing  components  were  created  for 
this  task.  We  then  describe  the  results  of  our  simulation  of 
the  Marcus  et  al  data,  showing  that  our  simulation  can  learn 
the  concepts  within  the  number  of  trials  that  the  infants  had, 
without  supervision  and  without  pre-learning.  We  also  show 
that  the  simulation  can  exhibit  the  same  results  with  noisy 
input  data.  Finally,  we  discuss  some  of  the  implications  of 
the  symbolic  similarity  approach  for  models  of  cognitive 
processing. 

Modeling  infant  learning  via  structural 
alignment 

A  psychological  model  of  the  infants’  learning  must  in¬ 
clude  the  kind  of  input,  the  way  the  infants  are  assumed  to 
encode  the  individual  sentences,  and  the  processes  by  which 
they  generalize  across  the  sentences.  The  architecture  of  our 
simulation  is  shown  in  Figure  1.  We  first  describe  our  as¬ 
sumptions  concerning  the  infants’  processing  capacities. 
Then  we  describe  each  component  in  turn. 

Processing  Assumptions:  We  assume  that  infants  can 
represent  the  temporal  order  within  the  sentences  (Saffran, 
Aslin  &  Newport,  1996).  We  further  assume  that  the  infants 
notice  and  encode  identities  within  the  sentences:  for  exam¬ 
ple,  the  fact  that  the  last  two  elements  match  in  an  ABB  sen¬ 
tence.  This  assumption  is  consistent  with  evidence  that  hu¬ 
man  infants,  as  well  as  with  studies  of  nonhuman  primates 
(Oden  et  al,  in  press),  can  detect  identities.  We  also  assume 
that  infants  can  detect  similarities  between  sequentially  pre¬ 
sented  stimuli,  consistent  with  studies  of  infant  habituation, 
which  demonstrate  that  infants  respond  to  sequential  same¬ 
ness  (e.g.,  Baillargeon,  1994). 

1  That  structure-mapping  algorithm  neither  subsumes,  nor  is 
subsumed  by,  traditional  pattern  matching  such  as  unification  is 
shown  in  Falkenhainer,  Forbus,  &  Gentner  (1988). 


Sentence  Encoding 


Figure  1:  Simulation  Architecture 

Input  stimuli:  To  make  our  simulation  comparable  with 
others,  we  use  a  representation  similar  to  that  of  Elman 
(1999),  namely,  Plunkett  &  Marchman’s  (1993)  distinctive 
feature  notation.  Each  word  has  twelve  phonetic  features, 
which  can  be  either  present  or  absent.  The  presence  or  ab¬ 
sence  of  each  feature  for  each  word  is  encoded  by  symbolic 
assertions.  If  feature  n  is  present  for  word  w,  the  assertion 
(Rn  w)  is  included  in  the  stimulus,  and  if  absent,  the  asser¬ 
tion  (S n  w )  is  included.  Thus  the  acoustic  features  of  each 
word  are  encoded  as  twelve  attribute  statements. 

We  modeled  the  Marcus  et  al  experiment  both  without 
noise  (Experiment  1)  and  with  noise  (Experiment  2).  Marcus 
et  al.  used  a  speech  synthesizer  to  control  the  pronunciation 
of  the  stimuli,  but  while  this  reduces  variability,  it  cannot 
eliminate  the  possibility  that  the  infant  might  encode  some¬ 
thing  incorrectly. 


Temporal  encoding:  We  assume  that  the  infant  encodes 
the  temporal  sequence  of  the  words  in  a  sentence  in  two 
ways.  First,  each  incoming  word  has  an  attribute  associated 
with  it,  corresponding  to  the  order  in  which  it  appears  (i.e., 
FIRST,  SECOND,  or  THIRD).  We  further  assume  that  the 
infant  encodes  temporal  relationships  between  the  words  in 
a  sentence:;  to  code  this,  an  AFTER  relation  is  added  be¬ 
tween  pairs  of  words  in  the  same  sentence  indicating  their 
relative  temporal  ordering.  The  particular  labels  used  in  this 
encoding  step  are  irrelevant  -  there  are  no  rules  in  the  sys¬ 
tem  that  operate  on  these  specific  predicates  -  the  point  is 
simply  that  infants  are  encoding  the  temporal  order  of  words 
within  sentences. 

Regularity  Encoding:  We  assume  that  the  infants  notice 
and  encode  identities  within  the  sentences:  for  example,  the 
fact  that  the  last  two  elements  match  in  an  ABB  sentence. 
Thus  the  simulation  must  incorporate  a  process  that  detects 
when  words  are  the  same.  We  use  the  MAGI  model  of  sym¬ 
metry  and  regularity  detection  (Ferguson,  1994)  to  auto¬ 
matically  compute  these  relationships.  MAGI  treats  symme¬ 
try  as  a  kind  of  self-similarity,  using  a  modified  version  of 
structure-mapping’s  constraints  to  guide  the  self-alignment 
process.  MAGI  has  been  successfully  used  with  inputs  rang¬ 
ing  from  stories  to  mathematical  equations  to  visual  stimuli. 


and  it  has  done  well  at  modeling  certain  aspects  of  visual 
symmetry,  including  making  new  predictions  (Ferguson  et  al 
1996).  Here  MAGI  is  used  on  the  collection  of  words  in  a 
sentence.  For  any  pair  of  words  wl  and  w2  that  MAGI  finds 
sufficiently  similar,  this  module  asserts  (SIM  wl  w2),  and  a 
DIFF  statement  for  every  other  pair  of  words  in  the  sen¬ 
tence.  (If  MAGI  does  not  find  any  pairs  similar,  DIFF 
statements  are  asserted  for  every  pair  of  words.)  This  mod¬ 
ule  also  asserts  (GROUP  wl  w2 )  for  pairs  of  similar  words, 
to  mark  that  they  form  a  substructure  in  the  stimulus,  and 
adds  DIFF  statements  between  groups  and  words  not  in  the 
group.  This  use  of  MAGI  is  an  example  of  what  Ferguson 
(1994,  in  preparation)  calls  analogical  encoding. 

SEQL 

Once  each  sentence  is  encoded,  we  assume  infants  can  de¬ 
tect  the  similarities  between  sequential  pairs  of  sentences. 
The  detection  of  structurally  parallel  patterns  across  a  se¬ 
quence  of  examples  is  modeled  by  SEQL  (Skorstad,  Gentner 
&  Medin,  1988;  Kuehne,  Forbus,  Gentner  &  Quinn,  2000),  a 
model  of  the  process  of  category  learning  from  examples. 
SEQL  constructs  category  descriptions  via  incremental  ab¬ 
straction.  That  is,  the  representation  of  a  category  is  a  struc¬ 
tured  description  that  has  been  generated  by  successive 
comparison  with  incoming  exemplars.  If  the  new  exemplar 
and  the  category  are  sufficiently  similar,  the  category  de¬ 
scription  is  modified  to  be  their  intersection  —  i.e.,  the  com¬ 
monalities  computed  via  structural  alignment  by  a  generali¬ 
zation  algorithm.  If  the  new  exemplar  is  not  sufficiently 
similar,  it  is  stored  separately  and  may  later  be  used  as  the 
seed  of  a  new  category. 

The  structural  alignment  process  is  implemented  via 
SME,  (Falkenhainer  et  al  1988;  Forbus  et  al  1994)  a  cogni¬ 
tive  simulation  of  analogical  matching.  Here  the  base  de¬ 
scription  is  a  category  description,  and  the  target  description 
is  the  new  exemplar.  The  structural  alignments  that  SME 
computes  are  used  in  three  ways  by  SEQL.  First,  the  nu¬ 
merical  structural  evaluation  score  it  computes2  is  used  as  a 
similarity  metric,  a  numerical  measure  for  deciding  whether 
or  not  two  descriptions  are  sufficiently  similar.  Second,  the 
candidate  inferences  it  computes  serve  as  a  model  for  cate- 
gory-based  induction  (c.f.  Blok  &  Gentner,  2000;  Forbus, 
Gentner,  Everett,  &  Wu,  1997).  Third,  the  correspondences 
in  the  best  mapping  SME  produces  serves  as  the  basis  for 
SEQL’s  generalization  algorithm. 

SEQL  maintains  a  set  of  generalizations  and  a  set  of  sin¬ 
gular  exemplars.  When  a  new  exemplar  comes  in,  it  is  com¬ 
pared  against  existing  generalizations  to  see  if  it  can  be  as¬ 
similated  into  one  of  them.  Otherwise,  it  is  compared  with 
the  stored  exemplars  to  see  if  a  new  generalization  can  be 
formed.  If  it  is  insufficiently  similar  to  both  the  generaliza¬ 
tions  and  the  stored  exemplars,  it  is  stored  as  an  exemplar 
itself. 

SEQL  begins  with  no  generalizations;  it  simply  stores  its 
first  exemplar.  If  the  next  exemplar  is  sufficiently  close  to 
the  first,  their  overlap  is  stored  as  the  first  generalization.  A 

2  Although  SME  can  compute  multiple  mappings,  we  use  the 
structural  evaluation  score  of  the  best  mapping,  normalized  by  the 
size  of  the  base  description. 


generalization  consists  of  the  overlap  between  the  two  input 
descriptions:  that  is,  the  shared  structure  found  by  align¬ 
ment.  Thus  generalizations  are  structured  descriptions  of  the 
same  type  as  the  input  descriptions,  although  containing 
fewer  specific  features.  If  a  new  exemplar  is  sufficiently 
similar  to  a  generalization  (as  determined  comparing  the 
structural  evaluation  score  to  a  set  threshold),  then  (a)  the 
generalization  is  updated  by  retaining  only  the  overlapping 
description  that  forms  the  alignment  between  the  generaliza¬ 
tion  and  the  exemplar;  and  (b)  candidate  inferences  are  pro¬ 
jected  from  the  generalization  to  the  exemplar.  Non¬ 
overlapping  aspects  of  a  description  (e.g.,  phonetic  features 
or  relations  that  aren’t  shared)  are  thus  “worn  away”  with 
each  new  assimilated  description.  (The  threshold  that  de¬ 
termines  when  descriptions  are  sufficiently  similar  to  be 
assimilated  helps  prevent  descriptions  from  diminishing  into 
vacuity.) 

Returning  now  to  the  infant  studies,  we  assume  that  babies 
are  carrying  out  an  ongoing  process  of  comparing  and  align¬ 
ing  the  incoming  exemplars  with  an  evolving  generalization. 
We  further  assume  that  the  relational  candidate  inferences 
from  the  general  pattern  to  a  new  exemplar  represent  expec¬ 
tations  on  part  of  the  infant.3  When  these  expectations  are 
violated  by  an  incoming  stimulus  that  does  not  fit  the  gener¬ 
alized  pattern  (e.g.,  an  ABB  test  sentence  after  the  ABA 
generalization  has  been  formed),  we  assume  the  infant  re¬ 
quires  extra  time  to  process  the  inconsistent  stimulus. 

Simulation  Experiments 

In  both  experiments,  we  followed  the  procedure  of  Mar¬ 
cus  et  al.  Each  stimulus  was  a  simple  three -word  sentence, 
encoded  as  described  earlier.  There  were  two  sets  of  train¬ 
ing  stimuli,  one  following  the  ABA  pattern  and  one  follow¬ 
ing  the  ABB  pattern.  The  training  stimuli  were  (ABA)  de- 
di-de,  de-je-de,  de-li-de,  de-we-de,  ji-di-ji,  ji-je-ji,  ji-li-ji,  ji- 
we-ji,  le-di-le,  le-je-le,  le-li-le,  le-we-le,  wi-di-wi,  wi-je-wi, 
wi-li-wi,  wi-we-wi  and  (ABB)  de-di-di,  de-je-je,  de-li-li,  de- 
we-we,  ji-di-di,  ji-je-je,  ji-li-li,  ji-we-we,  le-di-di,  le-je-je,  le- 
li-li,  le-we-we,  wi-di-di,  wi-je-je,  wi-li-li,  wi-we-we.  The 
test  stimuli  in  both  experiments  were  four  descriptions  rep¬ 
resenting  two  novel  ABA-type  (ba-po-ba,  ko-ga-ko)  and  two 
novel  ABB-type  sentences  (ba-po-po,  ko-ga-ga).  The 
threshold  value  for  SEQL  was  set  to  0.85  in  both  experi¬ 
ments. 

Experiment  1 

This  experiment  is  most  comparable  to  previous  simula¬ 
tion  models  of  the  phenomena,  in  that  we  assume  noise-free 
encoding  of  the  stimuli.  A  simulation  run  consists  of  expos¬ 
ing  SEQL  to  all  of  the  stimuli  from  a  particular  training  set 
(either  ABA  or  ABB)  once  and  then  seeing  the  response 
given  the  four  test  sentences.  To  avoid  possible  biasing  due 
to  sequence  effects  (See  Kuehne  et  al.,  2000),  20  simulation 
runs  were  made  for  each  training  set  using  different  random 


3  SME  can  also  produce  attribute-level  candidate  inferences,  and 
does  so  on  these  stimuli.  We  assume  that,  since  these  inferences 
concern  directly  perceivable  features,  testing  them  takes  very  little 
time. 


orders.  Identical  match  score  and  relational  candidate  infer¬ 
ences  were  produced  for  all  sequences  with  a  given  stimulus 
set.  In  each  case,  SEQL  produced  a  single  generalization 
during  the  learning  phase.  For  the  test  phase  we  used  encod¬ 
ings  of  the  corresponding  stimuli  used  with  infants,  as  noted 
above.  Tables  la  and  lb  show  the  results  of  this  series  for 
two  generalizations  paired  against  the  four  test  sentences. 


Table  la:  ABA  training  stimuli 


Test 

Match 

Candidate 

Stimulus 

Score 

Inferences 

Ba-po-ba 

0.658 

None 

Ko-ga-ko 

0.689 

None 

Ba-po-po 

0.486 

(DIFFpol  bal) 

(DIFF  pol  po2) 

(SIM  bal  po2) 

Ko-ga-ga 

0.455 

(DIFF  gal  kol) 

(DIFF  gal  ga2) 

(SIM  kol  gal) 

Table  lb:  ABB 

training  stimuli 

Test 

Match 

Candidate 

Stimulus 

Score 

Inferences 

Ba-po-ba 

0.328 

(SIM  pol  ba2) 

(DIFF  bal  (GROUP  pol 
ba2)) 

Ko-ga-ko 

0.350 

(SIM  gal  ko2) 

(DIFF  kol  (GROUP  gal 
ko2)) 

Ba-po-po 

0.776 

None 

Ko-ga-ga 

0.753 

None 

The  in-grammar  (bold)  and  out-of-grammar  (plain  text) 
matches  show  clear  differences  in  their  match  scores.  In¬ 
grammar  matches  are  above  0.64  and  do  not  generate  rela¬ 
tional  candidate  inferences.  Out-of-grammar  matches  have 
match  scores  below  0.5,  and  lead  to  relational  candidate 
inferences.  Thus  out-of-grammar  test  sentences  lead  to 
longer  looking  behavior,  as  predicted. 

Experiment  2 

As  noted  earlier,  we  believe  that  noise-free  stimulus  en¬ 
codings  are  unrealistic.  Consequently,  we  used  the  same 
procedure  as  Experiment  1,  but  this  time  introducing  noise 
into  the  representations  for  the  training  and  test  stimuli.  For 
each  sentence,  one  of  the  words  was  randomly  picked,  and 
one  of  its  attributes  (also  chosen  at  random)  was  dropped  or 
flipped,  with  the  rest  of  its  description  being  unchanged. 
Such  changes  can  be  significant:  for  example,  flipping  a 
single  phonetic  feature  turns  the  word  ‘de’  into  the  word 
‘di’.  Again,  20  simulation  runs  were  made  for  each  training 
set  using  different  random  orders.  Naturally  the  match 
scores  and,  to  a  lesser  degree,  the  generated  candidate  infer¬ 
ences,  did  vary  across  the  individual  runs.  Tables  2a  and  2b 
show  the  results.  The  scores  were  averaged  over  all  20  runs. 

Although  the  noise  affected  the  details  of  the  computa¬ 
tions,  the  overall  pattern  of  results  remains  the  same.  The 
in-grammar  (bold)  match  scores  are  far  higher  than  the  out- 
of-grammar  (plain  text)  scores;  and  the  out-of-grammar 


stimuli  produce  relational  candidate  inferences  while  the  in¬ 
grammar  stimuli  do  not. 


Table  2a:  ABA  training  stimuli 


Test 

Stimulus 

Average  Match 
Score 

Candidate 

Inferences 

Min,  Average,  Max 

ba-po-ba 

0.647 

0,  0,  0 

ko-ga-ko 

0.682 

0,  0,  0 

ba-po-po 

0.435 

2,  2.45,  3 

ko-ga-ga 

0.395 

2 , 2.55,  3 

Table  2b:  ABB  training  stimuli 


Test 

Stimulus 

Match 

Score 

Candidate 

Inferences 

Min,  Average,  Max 

ba-po-ba 

0.339 

2,  2,  2 

ko-ga-ko 

0.352 

2,  2.05,  3 

ba-po-po 

0.805 

0,  0,  0 

ko-ga-ga 

0.783 

0,  0,  0 

Comparison  with  other  models 

The  results  of  Marcus  et  al.  (1999)  have  sparked  an  active 
debate  focused  on  two  issues:  (1)  Can  current  connectionist 
models  (e.g.,  simple  recurrent  networks)  model  these  re¬ 
sults?  (2)  Do  infants  generate  abstract  rules  that  include 
variables? 

Regarding  the  adequacy  of  simple  recurrent  networks, 
Marcus  et  al.  state  “Such  networks  can  simulate  knowledge 
of  grammatical  rules  only  by  being  consequently  trained  on 
all  items  to  which  they  apply;  consequently,  such  mecha¬ 
nisms  cannot  account  for  how  humans  generalize  rules  to 
new  items  that  do  not  overlap  with  the  items  that  appeared  in 
the  training.”  Elman’s  (1999)  response  describes  his  use  of 
a  simple  recurrent  network  to  model  this  task.  Elman’s 
model  requires  tens  of  thousands  of  training  trials  on  the 
individual  syllables,  and  treats  the  problem  as  a  supervised 
learning  task,  unlike  the  task  facing  the  infants.  By  contrast, 
our  simulation  handles  the  learning  task  unsupervised,  and 
produces  human-like  results  with  only  exposure  to  stimuli 
equivalent  to  that  given  to  the  infants.  Moreover,  our  model 
also  continues  to  work  with  noisy  data,  something  not  true  of 
any  other  published  model  of  this  phenomenon  that  we  know 
of. 

The  learning  in  our  model  is  due  to  the  “wearing  away”  of 
non-identical  phonetic  attributes  through  subsequent  com¬ 
parisons.  Although  SEQL’s  learning  proceeds  faster  than 
connectionist  models,  it  is  still  slower  than  systems  that  gen¬ 
erate  abstractions  immediately  (e.g.,  explanation-based 
learning  (Delong  &  Mooney,  1986)).  In  SEQL’s  progres¬ 
sive  alignment  algorithm,  the  entities  in  the  generalizations 
lose  their  concrete  attributes  across  multiple  comparisons, 
leaving  the  relational  pattern  of  each  grammar  as  the  domi¬ 
nant  force  in  the  generalization  only  after  a  reasonable  num- 


ber  of  varied  examples  are  seen.4  There  is  considerable  evi¬ 
dence  for  this  kind  of  conservative  learning  (Forbus  & 
Gentner,  1986;  Medin  &  Ross,  1989). 

Turning  to  the  second  issue,  whether  infants  have  vari¬ 
ables  and  generate  abstract  rules,  Marcus  et  al  (1999)  claims 
“[I]nfants  extract  abstract  algebra-like  rules  that  represents 
relationships  between  placeholders  (variables),  such  as  ‘the 
first  item  X  is  the  same  as  the  third  item  Y,’  or  more  gener¬ 
ally  that  ‘item  I  is  the  same  as  item  J.’”  But  our  simulation 
does  not  introduce  variables,  in  the  sense  commonly  used  in 
mathematics  or  logic.  The  generalizations  constructed  by 
SEQL  do  indeed  include  relational  patterns  that  survive  re¬ 
peated  comparisons  because  they  are  shared  across  the  in¬ 
grammar  exemplars.  Furthermore,  the  entities  (words)  in  the 
generalizations  have  many  fewer  features  than  the  original 
words,  as  a  result  of  the  wearing  away  of  features  in  succes¬ 
sive  comparisons.  One  could  consider  these  patterns  as  a 
form  of  psychological  rule,  as  proposed  by  Gentner  and 
Medina  (1998),  with  the  proviso  that  the  elements  in  the  rule 
are  not  fully  abstract  variables,  although  they  might  asymp¬ 
totically  approach  pure  variables. 

Discussion 

This  paper  proposes  a  third  kind  of  explanation  for  the  in¬ 
fant  learning  phenomena  of  Marcus  et  al  (1999):  incremental 
abstraction  of  symbolic  descriptions  via  stmctural  align¬ 
ment.  We  believe  our  explanation  is  currently  the  best  one 
for  three  reasons.  First,  it  models  the  infant  data  with  fewer 
extra  concessions  than  previously  published  models  (i.e.,  no 
pre-training,  no  supervision,  and  noisy  data).  Second,  the 
processes  we  postulate  are  cognitively  general;  they  apply  to 
a  large  set  of  phenomena.  Third,  the  abstraction  processes 
we  propose  are  consistent  with  research  demonstrating  that 
human  learning  is  initially  conservative  (Brooks,  1987;  For¬ 
bus  &  Gentner,  1986;  Medin  &  Ross,  1989).  Interestingly, 
there  is  ongoing  research  in  developing  symbolic  connec- 
tionist  models  consistent  with  these  processes  (e.g.,  Holyoak 
&  Hummel,  1997). 

Many  issues  remain  to  be  explored.  For  example,  al¬ 
though  our  system  does  not  introduce  variables  in  its  gener¬ 
alization  process,  there  is  a  sense  in  which  the  entities  in  the 
generalization  are  on  their  way  to  becoming  variables.  Gent¬ 
ner  and  Medina  (1998)  have  proposed  that  the  process  of 
progressive  alignment  can  lead  to  rules.  They  further  sug¬ 
gested  that  the  application  of  rules  to  instances  can  be  ac¬ 
complished  using  the  same  general  processes  of  structural 
alignment  and  projection  that  are  used  in  analogy.  The  dif¬ 
ference  is  that  the  base  domain  is  an  abstraction,  the  entities 
are  ‘dummies’  with  no  features  to  either  help  or  impede  the 
match  with  the  specific  entities  in  the  exemplar.  Another 
issue  concerns  the  incorporation  of  statistical  notions  in 
SEQL.  Although  SEQL  is  to  a  certain  degree  noise-resistant. 


4  SEQL  learns  with  only  one  exposure  to  the  16  learning  sen¬ 
tences,  whereas  Marcus’s  infants  received  three  exposures  for  each 
sentence.  It  is  possible  that  the  infants  would  have  learned  with 
only  one  pass;  however  it  is  also  possible  that  the  infants  were  less 
consistent  in  detecting  the  similarities  than  our  simulation  with  its 
current  parameters. 


we  suspect  that  to  model  large-scale  learning,  it  will  need  to 
keep  track  of  more  statistical  information  than  it  does  cur¬ 
rently,  so  that  properties  wear  away  more  slowly. 

We  note  that  it  is  common  to  conflate  symbolic  process¬ 
ing  with  rule-based  behavior,  and  parallel  processing  with 
connectionist  models.  The  model  described  here  is  sym¬ 
bolic,  but  it  need  not  involve  variables  or  rules.  Further,  it 
involves  extensive  parallel  processing  (most  of  SME  and 
MAGI’ s  computations  are  parallel).  Given  the  complexity 
of  the  phenomena,  such  conflations  seem  unwise. 

The  debates  stirred  by  the  Marcus  et  al.  results  bear  on  a 
critical  issue  in  human  learning  and  development:  namely, 
what  knowledge  or  mechanisms  must  be  assumed  to  account 
for  the  rapid  and  powerful  achievements  demonstrated  by 
infants  in  both  cognition  and  language.  Our  results  suggest 
that  the  general  learning  mechanism  of  structure-mapping 
theory  may  go  a  long  way  in  accounting  for  these 
accomplishments. 
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